A STRUCTURE THEOREM FOR SETS OF SMALL
POPULAR DOUBLING 

PRZEMYSLAW MAZUR 


Abstract. In this paper we prove that every set $A \subset \mathbb{Z}$ satisfying the
inequality $\sum_{x} \min(1_A * 1_A(x), t) \le (2+\delta)t|A|$, for $t$ and $\delta$ in suitable
ranges, must be very close to an arithmetic progression. We
use this result to improve the estimates of Green and Morris for the
probability that a random subset $A \subset \mathbb{N}$ satisfies $|\mathbb{N} \setminus (A+A)| \ge k$;
specifically we show that $\mathbb{P}(|\mathbb{N} \setminus (A+A)| \ge k) = \Theta(2^{-k/2})$.


1. Introduction 

Let us start by recalling Freiman's $(3k-3)$ Theorem. It states that
every finite subset $A \subset \mathbb{Z}$ satisfying $|A+A| < 3|A| - 3$ is contained in an
arithmetic progression of length $|A+A| - |A| + 1$. Comparing this with
the lower bound $|A+A| \ge 2|A| - 1$, valid for all nonempty finite subsets
of $\mathbb{Z}$, we can see that this result describes sets for quite a large range of
values of $|A+A|$. Our goal is to give a similar result for a set with few
popular sums. Note that it cannot be done directly; the reason is that the
set $S_k(A) = \{x \in \mathbb{Z} : |A \cap (x - A)| \ge k\}$ of $k$-popular sums is empty if $k \ge 3$
and $A$ is a highly independent set. Instead, we need to consider a different
quantity, namely the average size of $S_k$ for $1 \le k \le t$, which also appeared
quite natural to Pollard in his work [Pol74] back in 1974.

At this point it is convenient to use the notation of convolution. From
now on, we will consider any abelian group $G$ to be equipped with the
counting measure, which leads to the definition

\[ f * g(x) = \sum_{y \in G} f(y) g(x - y) \]

for any functions $f, g : G \to \mathbb{C}$ for which the above expression makes sense
(i.e. is absolutely convergent; we will use it mostly for $f, g$ being indicator
functions of finite sets). Having this notation, we can restate Pollard's
theorem as

\[ \sum_{x \in \mathbb{Z}/p\mathbb{Z}} \min(1_A * 1_B(x), t) \ge \min(|A| \cdot |B|,\ t \cdot \min(p, |A| + |B| - t)) \]

for any prime $p$ and sets $A, B \subset \mathbb{Z}/p\mathbb{Z}$. It is not hard to prove the
corresponding statement for subsets of the integers (and even easier to deduce it
from Pollard's theorem); in particular, for a single set $A \subset \mathbb{Z}$ and an integer
$0 \le t \le |A|$ we have $\sum_{x \in \mathbb{Z}} \min(1_A * 1_A(x), t) \ge t(2|A| - t)$. One can also
prove (or deduce from Vosper's theorem [Vos56], a corresponding statement
for $\mathbb{Z}/p\mathbb{Z}$) that the only sets for which we have equality in the above
inequality are arithmetic progressions.
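
The integer form of Pollard's inequality is easy to test numerically. The following is a minimal sketch, not part of the original argument: the helper names `popular_sum` and `integer_bound` and the two sample sets are chosen here purely for illustration. It evaluates $\sum_x \min(1_A * 1_A(x), t)$ by brute force and prints the slack over $t(2|A| - t)$ for an arithmetic progression and for an arbitrary set.

```python
from collections import Counter

def popular_sum(A, B, t, modulus=None):
    """Return the sum over x of min(1_A * 1_B(x), t), in Z or in Z/modulus."""
    counts = Counter()
    for a in A:
        for b in B:
            s = a + b
            counts[s % modulus if modulus is not None else s] += 1
    return sum(min(c, t) for c in counts.values())

def integer_bound(size, t):
    """Lower bound t(2|A| - t) for a single set A of integers, 0 <= t <= |A|."""
    return t * (2 * size - t)

# Illustrative inputs (assumptions of this sketch, not taken from the paper).
progression = set(range(3, 73, 7))                    # 10 terms, common difference 7
scattered = {0, 1, 5, 11, 20, 34, 55, 89, 144, 233}   # 10 arbitrary integers

for t in range(0, 11):
    slack_ap = popular_sum(progression, progression, t) - integer_bound(10, t)
    slack_other = popular_sum(scattered, scattered, t) - integer_bound(10, t)
    print(t, slack_ap, slack_other)
```

For the progression the slack should vanish for every admissible $t$, in line with the equality case described above.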

Our goal is to extend this structure theorem to be able to recognize
sets $A \subset \mathbb{Z}$ satisfying $\sum_{x \in \mathbb{Z}} \min(1_A * 1_A(x), t) \le (2+\delta)t|A|$ for suitable
ranges of parameters $t$ and $\delta$ as those that can be almost entirely covered
by an arithmetic progression. Specifically, for $\frac{t}{|A|} \to 0$ we can pick $\delta$ as big as
$\frac{1}{4} - O\left(\frac{t}{|A|}\right)$. Note that we cannot expect all of $A$ to be covered by an arithmetic
progression: a simple counterexample is an arithmetic progression with one
extra point as far away as we like. The reader can check that for that set
and $t$ sufficiently large the parameter $\delta$ can be as close to 0 as we like,
yet our set cannot be covered with an arithmetic progression of bounded
length.
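
The counterexample can be checked by direct computation. The sketch below is an illustration only: the set (an interval of 300 integers plus the point $10^6$) and the helper names `sum_counts` and `delta` are assumptions made for this example. It computes the value of $\delta$ for which $\sum_x \min(1_S * 1_S(x), t) = (2+\delta)t|S|$ holds with equality.

```python
from collections import Counter
from fractions import Fraction

def sum_counts(S):
    """Map each x to 1_S * 1_S(x), the number of ordered pairs (u, v) with u + v = x."""
    counts = Counter()
    for u in S:
        for v in S:
            counts[u + v] += 1
    return counts

def delta(counts, size, t):
    """Return delta with sum_x min(1_S * 1_S(x), t) = (2 + delta) * t * |S|."""
    total = sum(min(c, t) for c in counts.values())
    return Fraction(total, t * size) - 2

# Hypothetical example: an interval of n integers plus one far-away point.
n, far = 300, 10**6
S = set(range(n)) | {far}
counts = sum_counts(S)
for t in (2, 5, 10, 20, 23):
    print(t, float(delta(counts, len(S), t)))
```

For this family a short calculation gives $\delta = (2n + 1 - 2t - t^2)/(t(n+1))$ whenever $2 \le t \le n$, so $\delta$ shrinks as $t$ grows, while any progression containing all of $S$ must also contain $0$, $1$ and the far point, and therefore has length above $10^6$.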

In the next sections we use this result to slightly modify the regularity
lemma proven by Green and Morris in [GM15], which allows us to improve
the estimates on the probability that a sumset of a random subset $A$ of
natural numbers misses at least $k$ elements. More precisely, we will show
not only that the sequence $p_k = 2^{k/2} \cdot \mathbb{P}(|\mathbb{N} \setminus (A+A)| \ge k)$ is bounded,
but also that it is increasing (and therefore convergent) along indices of the
same parity (i.e. odd or even).

Before proceeding, let us state precisely the results to be proven. 

Theorem 1.1. Let $S \subset \mathbb{Z}$ be a set of size $N > 0$ and let $t$ be a positive
integer. Suppose that

\[ \sum_{x \in \mathbb{Z}} \min(1_S * 1_S(x), t) \le (2+\delta)Nt \]

for some $\delta > 0$. Then there is an arithmetic progression $P$ of length at
most $(1+2\delta)N + 6t$ containing all but at most $\frac{5t}{2}$ points of $S$, provided that

\[ \delta + \frac{5t}{N} \le \frac{1}{4}. \]

Theorem 1.2. Let $A \subset \mathbb{N}$ be a set chosen randomly by picking each element
of $\mathbb{N}$ independently with probability $\frac{1}{2}$. Define a sequence $\{p_k\}$ via

\[ p_k = 2^{k/2} \cdot \mathbb{P}(|\mathbb{N} \setminus (A+A)| \ge k). \]

Then the subsequences $\{p_{2k}\}$ and $\{p_{2k+1}\}$ are both increasing and bounded
and therefore convergent. In particular $\mathbb{P}(|\mathbb{N} \setminus (A+A)| \ge k) = \Theta(2^{-k/2})$ and
the implied constants can only oscillate between $c$ and $c\sqrt{2}$ for some $c > 0$
as $k \to \infty$.
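
To get a feeling for the sequence $\{p_k\}$, one can run a crude Monte Carlo experiment. The sketch below rests on two assumptions that are not taken from the paper: $\mathbb{N}$ is read as $\{1, 2, 3, \dots\}$, and only missing sums in $\{1, \dots, M\}$ are counted, so missing sums above the cutoff $M$ are ignored. The names `count_missing` and `estimate_p` are hypothetical, and the estimates are only indicative for small $k$.

```python
import random

def count_missing(A, M):
    """Count s in {1, ..., M} that are not a sum of two elements of A (A inside {1, ..., M})."""
    member = [False] * (M + 1)
    for a in A:
        member[a] = True
    missing = 0
    for s in range(1, M + 1):
        # s is a sum iff some u <= s/2 has both u and s - u in A
        if not any(member[u] and member[s - u] for u in range(1, s // 2 + 1)):
            missing += 1
    return missing

def estimate_p(k_max, trials=2000, M=60, seed=0):
    """Estimate p_k = 2^(k/2) * P(at least k missing sums) for k = 1, ..., k_max."""
    rng = random.Random(seed)
    tail = [0] * (k_max + 1)   # tail[k] = number of trials with at least k missing sums
    for _ in range(trials):
        A = [x for x in range(1, M + 1) if rng.random() < 0.5]
        missing = count_missing(A, M)
        for k in range(1, min(missing, k_max) + 1):
            tail[k] += 1
    return {k: 2 ** (k / 2) * tail[k] / trials for k in range(1, k_max + 1)}

estimates = estimate_p(8)
for k, value in estimates.items():
    print(k, round(value, 3))
```

With this reading of $\mathbb{N}$ the element 1 is never a sum, so the estimate for $k = 1$ is exactly $\sqrt{2}$; the remaining values carry sampling error and truncation bias.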


2. Wrapping argument 

In this section we start proving Theorem 1.1 with methods similar to those that
Lev and Smeliansky used in [LS95] to prove (a generalisation of) Freiman's
$3k-3$ Theorem. More precisely, their first step was to wrap the set $S$ modulo
$q := (\max S - \min S)$ and consider a subset of a finite group instead. Note
that since the sets $(S + \min S)$ and $(S + \max S)$ share only one element, this
wrapping procedure results in a huge decrement in the doubling constant.
In our situation taking just the two endpoints would be too careless to achieve
good results; luckily we can still find two points near the ends such that
wrapping modulo their difference gives us what we need.

Proposition 2.1. Let $S \subset \mathbb{Z}$ be a finite set and suppose that $N := |S| > 0$.
Let $t$ be a positive integer satisfying $2t < N$ and suppose that

\[ \sum_{x \in \mathbb{Z}} \min(1_S * 1_S(x), t) \le (2+\delta)Nt \]

for some $\delta \ge 0$. Then there exist a positive integer $n$ and an integer $x$
such that the set $S' = (S \cap [x, x+n)) \pmod n$ (the image of the set
$S \cap [x, x+n)$ under the projection (mod $n$)) satisfies the following conditions:

• $|S'| \ge N - 2t$,

• $\sum_{x \in \mathbb{Z}/n\mathbb{Z}} \min(1_{S'} * 1_{S'}(x), t) \le \left(1 + 2\delta + \frac{6t}{N}\right) Nt$.

Let us make the remark that in the final statement the parameter $\delta$
comes with coefficient 2, which leads to some limitations in the statement
of Theorem 1.1 such as $\delta < \frac{1}{4}$. We believe that this argument can be
performed in a way that would give the coefficient 1, which would extend
the range of $\delta$ up to $\frac{1}{2}$ and consequently allow us to prove the corresponding
statement in a finite group of prime order using similar methods (for that
we need to let $\delta > 6 - 4\sqrt{2} > \ldots$).

Proof. Divide $S$ into three subsets $A, B, C$ with $|A| = |C| = t$, $|B| = N - 2t$,
$\max A < \min B$, $\max B < \min C$ (intuitively they are the left, middle and
right part respectively). Let $f, g, h$ be the corresponding indicator functions
(i.e. $f = 1_A$, $g = 1_B$, $h = 1_C$). Substituting this into the convolution we get

\[ 1_S * 1_S = (f + g + h) * (f + g + h) \ge 2(f * g + g * h), \]

where the inequality comes from discarding some of the nonnegative summands.
Note that since $\max(A + B) < \min(B + C)$, the functions $f * g$ and $g * h$
are supported on disjoint sets and therefore the above implies the following:

\[ \sum_{x \in \mathbb{Z}} \min(2 f * g(x), t) + \sum_{x \in \mathbb{Z}} \min(2 g * h(x), t) \le \sum_{x \in \mathbb{Z}} \min(1_S * 1_S(x), t) \le (2+\delta)Nt. \]

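Now we use the easy to check inequality $s^2 \ge t(2s - \min(2s, t))$, valid for all
real numbers $s$. Specifically, we substitute $s = f * g(x)$ and $s = g * h(x)$ for
all $x \in \mathbb{Z}$ and add them together to get

\[ \sum_{x \in \mathbb{Z}} (f * g(x))^2 + \sum_{x \in \mathbb{Z}} (g * h(x))^2 \ge t\bigl(4t(N - 2t) - (2+\delta)Nt\bigr) = (2-\delta)Nt^2 - 8t^3. \]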

Here we also used the previous estimate and the formula for the sum
$\sum_{x \in \mathbb{Z}} f * g(x) = \sum_{x \in \mathbb{Z}} g * h(x) = t(N - 2t)$. Note that since each $x \in \mathbb{Z}$ can
be written in exactly $f * g(x)$ ways as a sum $x = a + b$ for $a \in A$, $b \in B$
(and similarly for $g * h$), the above expression is in fact equal to

\[ \sum_{x \in \mathbb{Z}} (f * g)^2(x) + \sum_{x \in \mathbb{Z}} (g * h)^2(x) = \sum_{a \in A} \sum_{b \in B} f * g(a + b) + \sum_{c \in C} \sum_{b \in B} g * h(b + c). \]

By choosing elements $a \in A$ and $c \in C$ for which the inner sums are above
average, we can see that

\[ \sum_{b \in B} f * g(a + b) + \sum_{b \in B} g * h(b + c) \ge (2-\delta)Nt - 8t^2 \]

for some $a$ and $c$ (as $|A| = |C| = t$). Since $f * g$ and $g * h$ are bounded by
both $1_S * 1_S$ and $t$, we actually proved that

\[ \sum_{x \in (a+B) \cup (c+B)} \min(1_S * 1_S(x), t) \ge (2-\delta)Nt - 8t^2. \]

This is just enough for us to define the wrapping procedure. Indeed,
projection modulo $(c - a)$ merges the sets $a + B$ and $c + B$ into a single copy of
$B$, on which the sum of the values of the above function cannot exceed $t|B| =
t(N - 2t)$. After a short calculation $(2+\delta)Nt - ((2-\delta)Nt - 8t^2) + t(N - 2t) =
(1 + 2\delta)Nt + 6t^2$ we see that the set $S'$ for $x = a$ and $n = c - a$ satisfies the
inequality

\[ \sum_{x \in \mathbb{Z}/n\mathbb{Z}} \min(1_{S'} * 1_{S'}(x), t) \le (1 + 2\delta)Nt + 6t^2. \]

□
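
The wrapping step can be explored by brute force on small sets. The following sketch is an illustration under stated assumptions: instead of selecting $a$ and $c$ by the averaging argument of the proof, it simply tries every $a$ among the $t$ smallest and every $c$ among the $t$ largest elements of $S$, and it takes $\delta$ to be the smallest nonnegative value for which the hypothesis holds. The function names and the test sets are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def popular_sum(S, t, modulus=None):
    """Return sum_x min(1_S * 1_S(x), t), over Z or over Z/modulus."""
    counts = Counter()
    for u in S:
        for v in S:
            s = u + v
            counts[s % modulus if modulus is not None else s] += 1
    return sum(min(c, t) for c in counts.values())

def wrap_report(S, t):
    """Search all candidate pairs (a, c) and report the best wrapped set S'.

    Returns (best wrapped popular sum, bound (1 + 2 delta) N t + 6 t^2, |S'|, N).
    """
    ordered = sorted(S)
    N = len(ordered)
    assert t >= 1 and 2 * t < N
    delta = max(Fraction(0), Fraction(popular_sum(ordered, t), N * t) - 2)
    bound = (1 + 2 * delta) * N * t + 6 * t * t
    best = None
    for a in ordered[:t]:            # candidates from the left part A
        for c in ordered[-t:]:       # candidates from the right part C
            n = c - a
            wrapped = {s % n for s in ordered if a <= s < c}
            value = popular_sum(wrapped, t, modulus=n)
            if best is None or value < best[0]:
                best = (value, len(wrapped))
    return best[0], bound, best[1], N

value, bound, size, N = wrap_report(set(range(40)) | {100, 101, 500}, 3)
print(value, float(bound), size, N)
```

Since the proposition guarantees a good pair with $a \in A$ and $c \in C$, the minimum found by the exhaustive search should not exceed the stated bound.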

After proving this our goal is to show that the set $S'$ is close to a coset
of a subgroup of $\mathbb{Z}/n\mathbb{Z}$, which would correspond to a progression in $\mathbb{Z}$. We
will deal with that problem in the next section.

3. Popular doubling less than 3/2

The next step of the proof by Lev and Smeliansky was to use Kneser's
Theorem stating that for any finite subsets $A, B$ of an abelian group $G$ the
subgroup of all elements $h$ satisfying $A + B + h = A + B$ has cardinality
at least $|A| + |B| - |A + B|$. The proof of that theorem requires checking
a lot of scenarios and it is not clear how one could modify it to work for
popular sums. On the contrary, the proof of a weaker statement that if
$|A| = |B| = N$ and $|A + B| < \frac{3N}{2}$ then $A + B$ is a coset of a subgroup is
much easier and, as it turns out, generalisable to our setting. Specifically,
in this section we will proceed towards the following statement.

Proposition 3.1. Let $G$ be an abelian group and let $A, B \subset G$ be sets of size
$N > 0$. Let $t, \eta > 0$ be two real numbers satisfying the inequality $\eta + \frac{t}{N} \le \frac{1}{2}$.
Moreover suppose that the following inequality holds:

\[ \sum_{x \in G} \min(1_A * 1_B(x), t) \le (1 + \eta)Nt. \]

Then there exists a subgroup $H \le G$ and cosets $C_A$ and $C_B$ of $H$ satisfying
the following conditions:

• $|H| \le (1 + \eta)N$,

• $|A \setminus C_A| + |B \setminus C_B| < t$.

Note that in this section we do not require t to be an integer anymore. 
Let us also remark that we only need the above statement for the case 
A = B, but the proof of the more general case is not much harder so we 
decided to include it here. For convenience of the reader we split it into 
several lemmas. First of all we want to find the subgroup H. Although the 
statement of the following lemma appears to be new in the literature, the 
methods going into the proof were used in [Fon77].

Lemma 3.2. Let $G$ be an abelian group and let $A, B \subset G$ be sets of size
$N > 0$. Let $t, \eta > 0$ be two real numbers satisfying the inequality $\eta + \frac{t}{N} \le \frac{1}{2}$.
Moreover suppose that the following inequality holds:

\[ \sum_{x \in G} \min(1_A * 1_B(x), t) \le (1 + \eta)Nt. \]

Then there exists a subgroup $H \le G$ satisfying the following conditions:

• $1_A * 1_{-A}(x) \ge (1 - \eta)N$ and $1_B * 1_{-B}(x) \ge (1 - \eta)N$ for all $x \in H$,

• $1_A * 1_{-A}(x) < 2t$ and $1_B * 1_{-B}(x) < 2t$ for all $x \in G \setminus H$.

Before we proceed, let us observe that the triangle inequality for the
symmetric difference of sets $|V \triangle W| \le |U \triangle V| + |U \triangle W|$ can be rearranged as
$|U \cap V| + |U \cap W| \le |U| + |V \cap W|$. We will frequently use a variant of
this inequality, namely the assertion that $|U \cap (V - v)| + |U \cap (W - w)| \le
|U| + |V \cap (W - w + v)|$ for various choices of $v, w$. We will refer to all kinds
of this statement as triangle inequality.
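
Both forms of this inequality can be sanity-checked on random subsets of a cyclic group. The sketch below is only an illustration; the group $\mathbb{Z}/31\mathbb{Z}$, the seed and the helper names are arbitrary choices, not taken from the paper.

```python
import random

def triangle_gap(U, V, W):
    """Return |U| + |V ∩ W| - |U ∩ V| - |U ∩ W|; the inequality says this is >= 0."""
    return len(U) + len(V & W) - len(U & V) - len(U & W)

def shifted(X, s, n):
    """Return the translate X - s inside Z/nZ."""
    return {(x - s) % n for x in X}

def random_subset(rng, n):
    return {x for x in range(n) if rng.random() < 0.5}

n = 31
rng = random.Random(2024)
worst = None
for _ in range(500):
    U, V, W = (random_subset(rng, n) for _ in range(3))
    v, w = rng.randrange(n), rng.randrange(n)
    # Variant with translates: apply the inequality to U, V - v and W - w.
    gap = triangle_gap(U, shifted(V, v, n), shifted(W, w, n))
    worst = gap if worst is None else min(worst, gap)
print("smallest gap observed:", worst)
```

The translated form follows from the plain one because $|(V - v) \cap (W - w)| = |V \cap (W - w + v)|$.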

Proof. Let us start with the set $D = \{x \in G : 1_A * 1_B(x) \ge t\}$. Note that
$D$ contains most of the sums $a + b$; more precisely we have

\[
\begin{aligned}
\#\{(a, b) \in A \times B : a + b \in D\} &= \sum_{x \in D} 1_A * 1_B(x) = \sum_{x \in D} t + \sum_{x \in G} \max(1_A * 1_B(x) - t, 0) \\
&= t|D| + \sum_{x \in G} 1_A * 1_B(x) - \sum_{x \in G} \min(1_A * 1_B(x), t) \ge t|D| + N^2 - (1 + \eta)Nt.
\end{aligned}
\]

Now let $H = \{x \in G : 1_A * 1_{-A}(x) \ge 2t\}$. We would like to show that for
any $h \in H$ we have $1_B * 1_{-B}(h) \ge (1 - \eta)N$. Let us start by noticing that
for any $a \in A \cap (A + h)$ we have

\[ |B \cap (B + h)| \ge |(B + a) \cap D| + |(B + a + h) \cap D| - |D| \]

by triangle inequality. Now let $f : G \to [0, 1]$ be an auxiliary function
supported on $A \cap (A + h)$ and satisfying the condition $\sum_{x \in G} f(x) = 2t$
(there exists one by choice of $h$). Multiplying the above inequality by $f(a)$
and adding them together we get

\[ 2t|B \cap (B + h)| \ge \sum_{a \in A} (f(a) + f(a - h))|(B + a) \cap D| - 2t|D|. \]

Now we combine the inequalities

\[ \sum_{a \in A} 2|(B + a) \cap D| \ge 2t|D| + 2N^2 - 2(1 + \eta)Nt, \]

\[ \sum_{a \in A} (2 - f(a) - f(a - h))|(B + a) \cap D| \le |B| \sum_{a \in A} (2 - f(a) - f(a - h)) = N(2N - 4t) \]

to get

\[ |B \cap (B + h)| \ge \frac{2t|D| + 2N^2 - 2(1 + \eta)Nt - N(2N - 4t) - 2t|D|}{2t} = (1 - \eta)N. \]

In a similar way we can prove that the inequality $1_B * 1_{-B}(x) \ge 2t$ implies
$1_A * 1_{-A}(x) \ge (1 - \eta)N$. Since $(1 - \eta)N \ge 2t$, we have just constructed the
set $H$ satisfying all the postulated inequalities. The only thing remaining
is to show that $H$ is a subgroup. This follows from triangle inequality: if
$h_1, h_2 \in H$, then

\[ |A \cap (A + h_1 - h_2)| \ge |A \cap (A + h_1)| + |A \cap (A + h_2)| - |A| \ge 2(1 - \eta)N - N = (1 - 2\eta)N \ge 2t \]

and consequently $h_1 - h_2 \in H$. □
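
The construction of $H$ can be followed on a concrete example. The sketch below is hypothetical: it works in $G = \mathbb{Z}/200\mathbb{Z}$ with $A = B$ equal to 99 even residues plus the single odd point 1, and with $t = 10$; none of these choices come from the paper. It computes $\eta$ from the popular sum, builds $H = \{x : 1_A * 1_{-A}(x) \ge 2t\}$ and lets one inspect the two conclusions of Lemma 3.2.

```python
from fractions import Fraction

def autocorrelation(A, n):
    """Return the list x -> 1_A * 1_{-A}(x) = |A ∩ (A + x)| for x in Z/nZ."""
    A = {a % n for a in A}
    return [sum(1 for a in A if (a - x) % n in A) for x in range(n)]

def popular_sum_mod(A, B, t, n):
    """Return sum over x in Z/nZ of min(1_A * 1_B(x), t)."""
    counts = [0] * n
    for a in A:
        for b in B:
            counts[(a + b) % n] += 1
    return sum(min(c, t) for c in counts)

# Hypothetical example: most of the even residues of Z/200Z plus one stray odd point.
n, t = 200, 10
A = set(range(0, 198, 2)) | {1}
N = len(A)                                                 # N = 100
eta = Fraction(popular_sum_mod(A, A, t, n), N * t) - 1     # smallest admissible eta
corr = autocorrelation(A, n)
H = {x for x in range(n) if corr[x] >= 2 * t}

print("eta =", eta, " hypothesis eta + t/N <= 1/2:", eta + Fraction(t, N) <= Fraction(1, 2))
print("|H| =", len(H), " closed under differences:",
      all((h1 - h2) % n in H for h1 in H for h2 in H))
```

In this example $H$ comes out as the subgroup of even residues, and the stray odd point contributes only to the error term measured by $\eta$.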

Let us now turn for the moment to some estimates of the expressions of
the form $\min(F(x), t)$.

Lemma 3.3. Let $G$ be an abelian group and let $F : G \to [0, M]$ be a function
satisfying $\sum_{x \in G} F(x) < \infty$. Then for any $t \in [0, M]$ we have

\[ \sum_{x \in G} \min(F(x), t) \ge \frac{t}{M} \sum_{x \in G} F(x). \]

Proof. Notice that $\frac{t}{M} F(x) \le \min(t, F(x))$ for each individual $x$. □

Corollary 3.4. Let $G$ be an abelian group and let $F : G \to [0, +\infty)$ be a
function satisfying $\sum_{x \in G} F(x) < \infty$. Then for any $t' > t > 0$ we have

\[ \sum_{x \in G} \min(F(x), t) \ge \frac{t}{t'} \sum_{x \in G} \min(F(x), t'). \]

Proof. Just use the above lemma for $\min(F(x), t')$. □
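
The content of Corollary 3.4 is that the quantity $\frac{1}{t} \sum_x \min(F(x), t)$ does not increase with $t$. A minimal numerical sketch of this monotonicity follows; the function values in `F` are arbitrary sample data and `capped_sum` is a hypothetical helper name.

```python
from fractions import Fraction

def capped_sum(F, t):
    """Return the sum of min(F(x), t) over the listed values of F."""
    return sum(min(value, t) for value in F)

# Arbitrary nonnegative sample values standing in for F on a finite group.
F = [Fraction(1, 2), Fraction(3), Fraction(7), Fraction(7), Fraction(12), Fraction(40)]
thresholds = [Fraction(k, 2) for k in range(1, 100)]
ratios = [capped_sum(F, t) / t for t in thresholds]
print([float(r) for r in ratios[:6]])
```

Exact rational arithmetic is used so that the comparison of ratios is not affected by rounding.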

Corollary 3.5. Let $G$ be an abelian group and let $A, B \subset G$ be sets of size
$N > 0$. Let $t, \eta > 0$ be two real numbers satisfying the inequality $\eta + \frac{t}{N} \le \frac{1}{2}$.
Moreover suppose that the following inequality holds:

\[ \sum_{x \in G} \min(1_A * 1_B(x), t) \le (1 + \eta)Nt. \]

Then for any $t' > t$ we have

\[ \sum_{x \in G} \min(1_A * 1_B(x), t') \le (1 + \eta)Nt'. \]

Let us go back to our considerations. We have already constructed a set
$D$ containing most of the sums $a + b$ and the subgroup $H$ satisfying certain
inequalities. Now it is time to construct a coset $C$ of $H$ containing most of
the sums $a + b$. We will do it in two steps: first we show that $C$ contains
just enough sums to perform calculations quite accurately, which in turn
will give us the cosets $C_A$ and $C_B$ with the desired properties.

Lemma 3.6. Let $G$ be an abelian group and let $A, B \subset G$ be sets of size
$N > 0$. Let $t, \eta > 0$ be two real numbers satisfying the inequality $\eta + \frac{t}{N} \le \frac{1}{2}$.
Moreover suppose that the following inequality holds:

\[ \sum_{x \in G} \min(1_A * 1_B(x), t) \le (1 + \eta)Nt. \]

Then there exists a coset $C$ of a subgroup $H$ satisfying $|C| \le (1 + \eta)N$ and
$\#\{(a, b) \in A \times B : a + b \in C\} > \frac{(1 + \eta)N^2}{2}$.

Proof. Let $H$ be the subgroup constructed in Lemma 3.2. We want to translate
$H$ to make it contain most of the sums $a + b$, so a reasonable choice is
to take $x_0 \in G$ for which $1_A * 1_B(x_0)$ is maximal and set $C = H + x_0$. Let
$k = N - 1_A * 1_B(x_0)$. Note that by previous considerations for any $t \le t' < \frac{N}{1 + \eta}$
we have $\sum_{x \in G} \min(1_A * 1_B(x), t') < N^2 = \sum_{x \in G} 1_A * 1_B(x)$, which implies
$1_A * 1_B(x_0) \ge \frac{N}{1 + \eta}$, or in other words $k \le \frac{\eta N}{1 + \eta}$. Moreover, by triangle inequality
we have $|1_A * 1_B(x + x_0) - 1_A * 1_{-A}(x)| \le k$; in particular $C$ contains
the set $C' = \{x \in G : 1_A * 1_B(x) \ge 2t + k\}$. Therefore we are interested in
the size of the set $\{(a, b) \in A \times B : a + b \in C'\}$. By the same calculations
as for the set $D$ in the proof of Lemma 3.2 we know that

\[ \#\{(a, b) \in A \times B : a + b \in C'\} \ge N^2 - (1 + \eta)N(2t + k) + (2t + k)|C'|. \]

Here we have also used the previous corollary with $t' = 2t + k$. Let us
bound the size of $C'$ from below. We know that

\[
\begin{aligned}
N^2 = \sum_{x \in G} 1_A * 1_B(x) &= \sum_{x \in G} \min(1_A * 1_B(x), 2t + k) + \sum_{x \in C'} (1_A * 1_B(x) - (2t + k)) \\
&\le (1 + \eta)N(2t + k) + |C'|(N - 2t - 2k),
\end{aligned}
\]

which rearranges to $|C'| \ge \frac{N^2 - (1 + \eta)N(2t + k)}{N - 2t - 2k}$. Substituting this into the
previous bound we get

\[ \#\{(a, b) \in A \times B : a + b \in C'\} \ge N^2 - \frac{N(2t + k)(\eta N - (1 + \eta)k)}{N - 2t - 2k}. \]

It is easy to check that the expression on the right hand side is a decreasing
function in $t$, so we can substitute $t = \left(\frac{1}{2} - \eta\right)N$ to get

\[ \#\{(a, b) \in A \times B : a + b \in C'\} \ge N^2 - \frac{N((1 - 2\eta)N + k)(\eta N - (1 + \eta)k)}{2(\eta N - k)}. \]

Now it is also easy to check that this being greater than $\frac{(1 + \eta)N^2}{2}$ is equivalent
to the inequality $(\eta N - k)^2 + (1 - 2\eta)\eta N k + \eta k^2 > 0$, which is true because
we assumed $\eta > 0$.

Now we only need to bound the size of $C$; because of triangle inequality
it is contained in the set $C'' = \{x \in G : 1_A * 1_B(x) \ge (1 - \eta)N - k\}$, so
using the corollary with $t' = (1 - \eta)N - k > t$ we get

\[ t'|C| \le t'|C''| = \sum_{x \in C''} \min(1_A * 1_B(x), t') \le \sum_{x \in G} \min(1_A * 1_B(x), t') \le (1 + \eta)Nt'. \]

Thus we have proved both desired inequalities. □

Now we are ready to prove Proposition 3.1.

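Proof of Proposition 3.1. Let $H$ and $C$ be as in previous considerations. By an
averaging argument, there exist $a \in A$, $b \in B$ with $|(A + b) \cap C| > \frac{(1 + \eta)N}{2}$
and $|(B + a) \cap C| > \frac{(1 + \eta)N}{2}$. Define $C_A = C - b$ and $C_B = C - a$. Now let
us estimate the sum of the expressions $\min(1_A * 1_B(x), t)$ separately on and
outside $C_A + C_B$. To do this, let $f, g, h : G \to [0, 1]$ be auxiliary functions
supported on $C_B$, $G \setminus C_B$ and $G \setminus C_A$ respectively, satisfying $f, g \le 1_B$,
$h \le 1_A$ and the conditions

\[ \sum_{x \in G} f(x) = t, \]

\[ \sum_{x \in G} g(x) = \min(|B \setminus C_B|, t), \]

\[ \sum_{x \in G} h(x) = \min\Bigl(|A \setminus C_A|,\ t - \sum_{x \in G} g(x)\Bigr). \]

Now we have the following estimates:

\[ \sum_{x \in C_A + C_B} \min(1_A * 1_B(x), t) \ge \sum_{x \in G} 1_{A \cap C_A} * f(x) > \frac{(1 + \eta)Nt}{2}, \]

\[ \sum_{x \notin C_A + C_B} \min(1_A * 1_B(x), t) \ge \sum_{x \notin C_A + C_B} \bigl(1_{A \cap C_A} * g(x) + 1_{B \cap C_B} * h(x)\bigr) \ge \frac{(1 + \eta)N}{2} \sum_{x \in G} (g(x) + h(x)). \]

Comparing that to the initial estimate $\sum_{x \in G} \min(1_A * 1_B(x), t) \le (1 + \eta)Nt$
we see that $\sum_{x \in G} (g(x) + h(x)) < t$, which is only possible if $\sum_{x \in G} g(x) =
|B \setminus C_B|$ and $\sum_{x \in G} h(x) = |A \setminus C_A|$. Therefore $|A \setminus C_A| + |B \setminus C_B| < t$.

To finish the proof, notice that $|H| = |C| \le (1 + \eta)N$. □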

Before proceeding, let us make the remark that in fact the larger the
subgroup $H$ is, the fewer points of $A$ and $B$ are allowed to lie outside $C_A$
and $C_B$ respectively. One can try to perform even more precise calculations,
using the fact that now we know that actually for all $x \in C = C_A + C_B$ we
have $1_A * 1_B(x) \ge t$. However we do not need it that much so we leave the
result as it is.


4. Completing the proof 

Having proved the results in the previous two sections, we are ready to
prove Theorem 1.1.

Proof of Theorem 1.1. Suppose that we have a set $S$ for which the
inequality $\sum_{x \in \mathbb{Z}} \min(1_S * 1_S(x), t) \le (2 + \delta)Nt$ holds. We use Proposition 2.1
to obtain a set $S' \subset \mathbb{Z}/n\mathbb{Z}$ of size at least $N - 2t$ satisfying the inequality
$\sum_{x \in \mathbb{Z}/n\mathbb{Z}} \min(1_{S'} * 1_{S'}(x), t) \le (1 + 2\delta)Nt + 6t^2$. We see that the assumptions
of Proposition 3.1 are satisfied with $A = B = S'$ as long as

\[ \frac{1}{2} \ge \frac{(1 + 2\delta)N + 6t - |S'|}{|S'|} + \frac{t}{|S'|} = \frac{(1 + 2\delta)N + 7t - |S'|}{|S'|}, \]

in other words $3|S'| \ge (2 + 4\delta)N + 14t$, which is certainly true if $\delta + \frac{5t}{N} \le \frac{1}{4}$.
Proposition 3.1 then tells us that the set $S'$ is essentially contained in a coset
of a subgroup of size at most $(1 + 2\delta)N + 6t$, with the exception of at most
$\frac{t}{2}$ points. Unwrapping the situation back again, we see that the set $S$ has
all but at most $\frac{5t}{2}$ elements contained in an arithmetic progression of length
at most $(1 + 2\delta)N + 6t$. □
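
The parameter bookkeeping in this proof can be verified mechanically. The sketch below assumes that the side condition of Theorem 1.1 reads $\delta + \frac{5t}{N} \le \frac{1}{4}$ and that the hypothesis of Proposition 3.1 reads $\eta + \frac{t}{|S'|} \le \frac{1}{2}$ with $(1 + \eta)|S'| = (1 + 2\delta)N + 6t$; the function name and the parameter grid are hypothetical.

```python
from fractions import Fraction

def proposition_3_1_applies(N, t, delta, size):
    """Check eta + t/|S'| <= 1/2, where (1 + eta)|S'| = (1 + 2 delta) N + 6 t."""
    eta = ((1 + 2 * delta) * N + 6 * t) / Fraction(size) - 1
    return eta + Fraction(t, size) <= Fraction(1, 2)

checked = 0
for N in range(20, 200, 7):
    for t in range(1, N // 20 + 1):
        for j in range(0, 11):
            delta = Fraction(j, 40)
            if delta + Fraction(5 * t, N) > Fraction(1, 4):
                continue                      # outside the assumed range of Theorem 1.1
            # |S'| may be anything between N - 2t and N
            assert all(proposition_3_1_applies(N, t, delta, size)
                       for size in range(N - 2 * t, N + 1))
            checked += 1
print("parameter triples checked:", checked)
```

The check mirrors the chain $3|S'| \ge 3N - 6t \ge (2 + 4\delta)N + 14t$ used in the proof.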

5. Regularity and counting sets with small sumset

This section is devoted to a lemma of Green and Morris on counting
subsets of a cyclic group of prime order satisfying certain bounds on the
size of the subset. Unfortunately we cannot just quote their result, as we
need a slight modification of it. Therefore we need to move back to the
statement of the regularity lemma, or more precisely, to [GM15, Theorem
2.1], stated below.

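Lemma 5.1 (Green-Morris, regularity lemma). For every $\varepsilon > 0$, there
exists $\delta = \delta(\varepsilon) > 0$ such that the following is true. Let $p > p_0(\varepsilon)$ be a
sufficiently large prime and let $A \subset \mathbb{Z}/p\mathbb{Z}$ be a set. There is a dilate $A^* = \lambda A$
and a prime $q$, $\frac{1}{\varepsilon^{10}} \le q \le p^{1 - \delta}$, such that the following holds. If $A_i^* =
A^* \cap I_i(q)$ for each $i \in \mathbb{Z}/q\mathbb{Z}$ then, for at least $(1 - \varepsilon)q^2$ pairs $(i, j) \in (\mathbb{Z}/q\mathbb{Z})^2$,

\[ \min(|A_i^*|, |A_j^*|) \le \varepsilon p/q \quad \text{or} \quad |A_i^* + A_j^*| \ge (2 - \varepsilon)p/q. \]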

Here we adopted the notation $I_i(q) = \{x \in \mathbb{Z}/p\mathbb{Z} : x/p \in [i/q, (i + 1)/q)\}$.
Note that $[i/q, (i + 1)/q) + [j/q, (j + 1)/q) \subset [(i + j)/q, (i + j + 2)/q)$ as
subsets of $\mathbb{R}/\mathbb{Z}$; intersecting those sets with $\mathbb{Z}/p\mathbb{Z}$ embedded in $\mathbb{R}/\mathbb{Z}$ in a
natural way, we get the inclusion $I_i + I_j \subset I_{i+j} \cup I_{i+j+1}$.
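
The inclusion $I_i + I_j \subset I_{i+j} \cup I_{i+j+1}$ (indices taken modulo $q$) can be confirmed by exhaustive search for small parameters. The sketch below is an illustration; the primes 101, 7, 97 and 11 and the function names are arbitrary choices. It uses the fact that $x \in I_i(q)$ exactly when $i = \lfloor xq/p \rfloor$ for the representative $x \in \{0, \dots, p - 1\}$.

```python
def block_index(x, p, q):
    """Index i with x/p in [i/q, (i+1)/q), for a residue x in {0, ..., p-1}."""
    return (x * q) // p

def inclusion_holds(p, q):
    """Check that x in I_i and y in I_j force x + y into I_{i+j} or I_{i+j+1}."""
    for x in range(p):
        for y in range(p):
            i, j = block_index(x, p, q), block_index(y, p, q)
            k = block_index((x + y) % p, p, q)
            if k not in ((i + j) % q, (i + j + 1) % q):
                return False
    return True

print(inclusion_holds(101, 7), inclusion_holds(97, 11))
```

The same search also covers pairs whose sum wraps around modulo $p$, which is where the indices have to be read modulo $q$.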

It is now time to prove some bounds on the number of sets having fixed
size and whose sumset has also fixed size. We cannot improve the bound
given by Green and Morris; instead we will introduce a better bound for the
number of exceptional sets and at the same time use our result to prove that
every non-exceptional set has certain structure. Specifically, we will proceed
towards the proof of the following statement.
Proposition 5.2. Let $\delta > 0$ and let $N > N_0(\delta)$ be a large natural number.
For every $k, m \in \mathbb{N}$ satisfying $\delta N \le k \le N$, $2k - 1 \le m \le 2N - 1$ the

following statement is true. The family of all subsets $X \subset \{1, \dots, N\}$ with

$|X| = k$ and $|X + X| = m$ can be divided into two classes; one of them, the
class of exceptional subsets, has cardinality at most , and each 

non-exceptional set (member of the other class) is almost contained in an
arithmetic progression $P$ of length at most $(1 + 1200\delta)\frac{m}{2}$, so that we have
$|(X + X) \setminus (P + P)| \le 24\delta N$.

For the definition of the function H, see the appendix. 

Before we proceed, let us make a few remarks. Firstly, our result is only
valid in subsets of integers and not in the cyclic groups of prime order; the
reason for that is that Theorem 1.1 is of the same kind. Secondly, even in that
case it does not improve the estimate of Green and Morris on the number of
all subsets with $|X| = k$ and $|X + X| = m$; however the additional structure
allows us to prove Theorem 1.2.

Since we are following the argument of Green and Morris, we will also need
Pollard's Theorem. It has already appeared in the introduction, but let us
state it once again in a somewhat more precise form.

Theorem 5.3 (Pollard). Let $p$ be a prime number and let $A, B \subset \mathbb{Z}/p\mathbb{Z}$ be
two sets. Let $t$ be an integer satisfying

\[ \max(0, |A| + |B| - p) \le t \le \min(|A|, |B|). \]

Then the following inequality holds:

\[ \sum_{x \in \mathbb{Z}/p\mathbb{Z}} \min(1_A * 1_B(x), t) \ge t(|A| + |B| - t). \]


Also, since Theorem 1.1 refers to subsets of the integers, we have to make
it more compatible with the regularity lemma, which refers to subsets
of a finite group. To link those two statements, let us prove the following
lemma.


Lemma 5.4. Let $p$ be a prime and let $P, Q \subset \mathbb{Z}/p\mathbb{Z}$ be arithmetic progressions
satisfying $|P| \le \frac{p}{4}$ and $|P \cap Q| \ge \frac{|Q|}{2} + 1$. Then the set $P \cap Q$ is an
arithmetic progression with the same common difference as $Q$.


Proof. By dilating if necessary, we can assume that $P$ is an interval (i.e.
the common difference of $P$ is 1). Since $|P \cap Q| \ge \frac{|Q|}{2} + 1$, we know that
there are two consecutive elements of $Q$ that belong to $P$. That means that
the common difference of $Q$ is less than the size of $P$; let us denote it by
$d$. Suppose for the sake of contradiction that the intersection $P \cap Q$ is not
a progression of common difference $d$. In other words, if we look at the
elements of $Q$ in order, we see at least two separate groups of elements of
$P \cap Q$ with at least one element of $Q \setminus P$ in between. Since the common
difference of $Q$ is $d$, each group of elements of $P \cap Q$ has cardinality at most
$\left\lceil \frac{|P|}{d} \right\rceil$ and each group of elements of $Q \setminus P$ (maybe except those containing
the endpoints) has cardinality at least $\left\lfloor \frac{p - |P|}{d} \right\rfloor$. Also, if we denote $l = \left\lceil \frac{|P|}{d} \right\rceil$,
then

\[ \left\lfloor \frac{p - |P|}{d} \right\rfloor > \frac{p - |P|}{d} - 1 \ge \frac{3|P|}{d} - 1 > 3\left\lceil \frac{|P|}{d} \right\rceil - 4 = 3l - 4, \]

so in fact $\left\lfloor \frac{p - |P|}{d} \right\rfloor \ge 3l - 3$. By $d < |P|$ we know that $l \ge 2$. Now denote by
$k \ge 2$ the number of groups of elements of $P \cap Q$; we see that the number of
groups of $Q \setminus P$ not containing the endpoints is $k - 1$. By assumption $Q \setminus P$
has at least 2 elements fewer than $P \cap Q$, which leads to the inequality

\[ kl \ge |P \cap Q| \ge |Q \setminus P| + 2 \ge (k - 1)(3l - 3) + 2. \]

Rearranging gives $(2k - 3)(2l - 3) \le -1$, which is impossible since both of
the factors are positive. □

Now we are ready to prove Proposition 5.2.


Proof of Proposition 5.2. Suppose that $\delta > 0$ is a sufficiently small constant.
Let $N > N_0(\delta)$ be a large natural number and let $p \in [8N, 16N]$ be a prime.
We consider each subset of $[N]$ to be a subset of $\mathbb{Z}/p\mathbb{Z}$ via the natural
embedding $[N] \to \mathbb{Z}/p\mathbb{Z}$. Then for a subset $A \subset [N]$ with $|A| \ge \delta N$ we
can use the regularity lemma with £ = 2“^h^ to obtain a dilate A* = XA, a 
prime number q and a corresponding partition A* = A*n/j. We can assume 
that Nq is so large that it forces q ^ S^p. We know that for at least {l — e)q‘^ 
pairs {i,j) G {'L/qTXf', either 


min(|A*|, |A*|) ^ eL or \A* + A*\ ^ (2 - e)L, 


where L = p/q. Now let S = {i E Z/qZ : \A*\ > eL}. We will call the set 
A exceptional if 151 ^ — 26)^ and non-exceptional otherwise. Now we 

need to check that those two classes actually satisfy postulated properties. 

The number of exceptional subsets can be estimated as follows: hrst we 
choose a prime q ^ h^p, then we choose a set 5 C ZjqZ of size at most 
(| — 26)^; there are at most 2*? ways of doing that. Having chosen S, we 
specify A* by choosing A* fl S' and A* \ S', where S' = We take 

into account that |A* \ 5'| ^ ep to get that the number of choices of A* is 
bounded by 



k-j 


The bound |5'| ^ — 25)(1 + ^)m comes from the fact the S' is a union of 

at most {^ — sets of size at most | + 1- Since A is a dilate of A*, the 
bound for the number of exceptional subsets is p times the above quantity. 
Using the estimates from the appendix, we get the claimed bound. 

Now let us turn our attention to non-exceptional sets. Suppose that for 
a set A the regularity lemma gave us a set S of size |5| > (| — 2,6)^. 
We would like to show that $S$ is Freiman 2-isomorphic to a set of integers
and satisfies the assumption of Theorem 1.1. To prove the former, note
that by definition of $\varepsilon$-regularity there has to be at least one pair $i, j$ with
$|A_i^* + A_j^*| \ge (2 - \varepsilon)p/q$. The set $A_i^* + A_j^*$ is contained both in $I_{i+j} \cup I_{i+j+1}$
and in $A^* + A^* \subset \{2\lambda, 3\lambda, \dots, 2N\lambda\}$. They are progressions satisfying the
assumptions of Lemma 5.4; in that case we can argue that the intersection
$(I_{i+j} \cup I_{i+j+1}) \cap \{2\lambda, 3\lambda, \dots, 2N\lambda\}$ is an interval of length at least $(2 - \varepsilon)p/q$.
Now let $\lambda'$ be an integer less than $\frac{p}{2}$ in absolute value and satisfying the
congruence $\lambda\lambda' \equiv 1 \pmod p$. It is easy to see that

\[ |\lambda'| \le \frac{2N - 2}{(2 - \varepsilon)p/q - 1}, \]

since we have a progression of length $(2 - \varepsilon)p/q$ and common difference $\lambda'$
contained in the interval $[2, 2N]$. Note also that if $i \in S$, then the interval
$I_i$ has nonempty intersection with $\{\lambda, 2\lambda, \dots, N\lambda\}$. Therefore $\lambda' I_i$ intersects
{1, 2,..., iV}, but X'li is itself contained in an interval of length \X'\p/q ^ 
This interval has to be contained in [—|, N+^] C [—|, y] by the intersection 
property. Therefore |A'|z G [—y y] as an element of Z/gZ and so all of S 
has to be contained in an arithmetic progression of length less than as 
required. 

We also need to get the bound for min(l 5 * lg{y),t) to be in 

position to use Theorem 11.11 To do that, set t = [2~^S'^q\ and let T be the 
set of all y G Z/gZ for which there exist i,jeS with |74* + y4*| ^ {2 — e)p/q. 
The sumset A* + A* is contained in Jjy + /jy+i, which allows us to write 


^ |(A + /l) n (4u 4„)| = 

^ ysT 

= ^ \{A + A)r\iy\+ ^ |(74 + 74)n4|^ 

yeT+{0,l} ?;GTn(T+l) 



Multiplying the above by ^ we get (2 — e)\T\ ^ |T fl (T + 1)|(1 + |) + 
which leads to 


|T + {0,1}| =2|T| - |Tn(T + l)| ^ 


g(m+ |Tn (T + 1)1) 

p 


+ e\T\^ 


q q{m + 6^p) (1 + 6)mq 

^ -{m + g + ep) ^ ^ -. 

p p p 


This will be more useful later, but at the moment the most important 
thing for us is that |T| ^ ii+s)mq ^ other hand, every y G {^/ql) \ T 

corresponds to ls*l 5 (z/) “bad” pairs (z,j) , for which min(1^4*1, |A*|) > ep/q, 
yet \A* + A* \ < (2 — e)p/q. The number of those does not exceed eq^ and 











therefore we have 

j/eZ/gZ y£T y^T 

p V mt / 

Combining the inequalities t ^ m ^ 2~^Sp and e = we can 

bound the above by (1 + 25)^. Recalling that \S\ ^ — 25)^, we see 

that we can use Theorem II.II if S is sufficiently small. Indeed, we can rewrite 
the bounds as 

min(lg * ls{y),t) ^ ^2 + t\S\. 

yGZIqZ ^ ^ 

Since ^ if) we can argue that all of S', perhaps except ^ elements, is 
contained in a progression Q dTLjqL of length at most |S'|(1 + 25h) + 6t. 
Now we use the assumption on |T + {0,1}| to say something about the 
common difference of Q. By Pollard’s Theorem applied to the set S (iQ we 
know that 

Y min(l 5 * ls{y),t) ^ f(2|S' n Q| - f) ^ t{2\S\ - 6t). 

y(zQ+Q 

Subtracting off those elements of Q + Q that are not in T, we get 
Y min(l5 * lsiy),t) > t( 2 |S'| - Qt) - eq^. 

2 /G(Q+Q)nr 

This means that we have \{Q + Q) nT\ ^ 2|S'| — 6 t — or in other words 
|(Q+Q)\T| ^ 2(255|S'|+6t)+6t+^ = 505|S'| + 18t+^. By the estimate we 
already know min( 15 * 15 (?/), t) ^ (1 + 5)^ and Pollard’s Theorem 

we see that 2 |^| - t^{l + 6)^ ^ so 2 |^| ^ Also, 18t ^ 18^ 
and ^ ^ , which altogether gives \{Q + Q)\T\^ Now we will 

examine how the set Q + Q behaves under addition of (0,1}. The elements 
of Q + Q inside T will form a subset of T + {0,1} of size at most . The 

part outside T will get at most doubled, so it will have size at most 
Therefore |Q + Q + {0,1}| ^ Q + Q is a progression of length 

at least 2|S'nQ|— 1 ^ 2|S'|—5f—1 ^ ^(1—45—6(5) = and therefore 

adding (0,1} to it produces at most new elements. By dilating, we 

can assume that Q is an interval and the set we are adding is (0, d} for 
some d. But then we see that d + Now consider the dilation by d 

inside "Ljp'L] then the intervals Q for y & Q become progressions of common 
difference d, and Q corresponds to an “interval” of such progressions. Their 
union is almost an interval itself — the only problem being near endpoints, 
where the progressions do not necessarily start at the points we like. We 
can compensate this by adding at most | points for each “residue class 
(mod d)” to get a genuine interval. The quotation marks mean that we are 


















working (mod $p$), so technically we cannot consider residue classes modulo
other numbers, but we have proved that we use only half of the space, so
we can pretend we work in the integers. In the end we get an interval $P$ of
length at most (|Q| + 2d){^ + 1) containing almost all of A; estimating that 
gives 


(IQI 


2d)i- + l) ^ 
Q 


mq 1 + 201h 

V 2 


+ 422(5)(l + 5^ 


p (1 + 12005)m 




Now note that the elements of $A + A$ outside $P + P$ are at worst the ones in
$I_y$ with $y \notin Q + Q$. Each $y \in T$ corresponds to at least $(1 - \varepsilon)\frac{p}{q} - 1$ elements of

A+A, so since \ {Q+Q)nT\ ^ 2\S\-6t-^ ^ ^{1-46-66-6) = we 

see that the intersection {A + A)n{P+P) has at least — e:)| — 1) ^ 

(1 — 126)m elements. That means that | (A + A) \ (P + P) | ^ 126m ^ 245iV, 
as claimed. □ 


6. Proof of Theorem 1.2

In this section we use the previous results to get Theorem 1.2. First let us
note that it can be easily reduced to the following statement.

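Proposition 6.1. There exist absolute constants $C_0, \varepsilon_0, k_0 > 0$ such that the
following is true. Suppose that $A \subset \mathbb{N}$ is a set chosen randomly by picking
each element of $\mathbb{N}$ independently with probability $\frac{1}{2}$. Then for every $k > k_0$
we have the inequality

\[ \mathbb{P}(|\mathbb{N} \setminus (A + A)| \ge k \text{ and } 1 \in A) \le C_0 (2 + \varepsilon_0)^{-k/2}. \]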

Comparing this to what we are trying to prove in the end, this statement
says that for a set $A$ satisfying $|\mathbb{N} \setminus (A + A)| \ge k$ it is exponentially (in $k$)
unlikely to contain 1.

Proof of Theorem \1.S\ assuming Proposition lh.il Let A C M be a random 
subset. Note that the conditional distribution of A on the event 1 ^ A is 
exactly the same as the initial distribution of ^4 + 1. This, and the fact that 
P(1 G A) = i, allows us to write (for each k ^ 2): 

pi,-pi,_2 = 2^/^ -FdNXiA + A)] ^ A;)-2(^-2)/Pp(|N\(Al + Al)| ^k-2) 

= 2^/2(P(|N\(Al + Al)| ^ A;)-|P(|M\((Al + l) + (Al + l))| ^ k) = 

= 2 ^/ 2 (p(|pj\ + ^ A;) -P(|N\ (yl + y4)| ^ k and 1 ^ A)) = 

= 2^/2 -PdNV (y4 + y4)| ^ A: and 1 e A). 

The above quantity is obviously nonnegative, which makes both sequences
$\{p_{2k}\}$ and $\{p_{2k+1}\}$ increasing. Proposition 6.1 allows us to say they are
bounded. Indeed, if $k > k_0$, then

$$p_k - p_{k-2} = 2^{k/2}\, \mathbb{P}\bigl(|\mathbb{N} \setminus (A + A)| \ge k \text{ and } 1 \in A\bigr) \le 2^{k/2} \cdot C_0 (2 + \varepsilon_0)^{-k/2} = C_0 \cdot \left(\frac{2}{2 + \varepsilon_0}\right)^{k/2}.$$

Let $\lambda = \frac{2}{2 + \varepsilon_0} < 1$. Summing the above inequalities we see that for any
$k > k_0$ we get

$$p_k - p_{k_0} \le \sum_{s = k_0/2}^{\infty} C_0 \lambda^{s} = \frac{C_0 \lambda^{k_0/2}}{1 - \lambda} < \infty.$$

Actually the above is true only for $k$ of the same parity as $k_0$; for the
remaining values we simply replace all instances of $k_0$ with $k_0 + 1$. □
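
The shift identity behind the computation above, namely that passing from $A$ to $A + 1$ adds exactly two missing sums, can be sanity-checked on finite truncations. The following Python sketch is an illustration only and not part of the argument; the helper names `missing_count`, `shift` and `check_shift_identity`, as well as the truncation parameter `M`, are our own assumptions.

```python
import random


def missing_count(A, M):
    """Number of integers m in [1, M] that are not in the sumset A + A."""
    sums = {a + b for a in A for b in A}
    return sum(1 for m in range(1, M + 1) if m not in sums)


def shift(A):
    """The translate A + 1."""
    return {a + 1 for a in A}


def check_shift_identity(trials=200, M=40, seed=0):
    """Check on random subsets A of [1, M] that A + 1 misses exactly two
    more sums in [1, M + 2] than A misses in [1, M]."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Each element is kept independently with probability 1/2.
        A = {a for a in range(1, M + 1) if rng.random() < 0.5}
        if missing_count(shift(A), M + 2) != missing_count(A, M) + 2:
            return False
    return True


print(check_shift_identity())
```

The two extra missing sums are 1 and 2, which can never be written as a sum of two elements of $A + 1$.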

For the rest of this section we will focus on proving Proposition 6.1.


Proof of Proposition 6.1. Let $\delta$ be a small quantity and let $k > N_0(\delta)$, with
$N_0(\delta)$ as in the statement of Proposition 5.2. Consider the set $X = A \cap [10k]$. First,
following Green and Morris, let us estimate the probability that $A + A$ misses one
of the elements greater than $10k$. For each such element $m$ we have at least
$\lfloor m/2 \rfloor$ pairs of (not necessarily distinct) natural numbers $u, v$ with $u + v = m$;
the probability that $u, v \in A$ is at least $\tfrac14$. Therefore the probability that
$m \notin A + A$ is bounded by $(\tfrac34)^{(m-1)/2}$ and the total contribution for numbers
greater than $10k$ is at most

$$\sum_{m = 10k + 1}^{\infty} \left(\tfrac34\right)^{(m-1)/2}.$$

We consider this a quantity less than $C_0(2 + \varepsilon_0)^{-k/2}$; our goal is to divide the
set of admissible events into classes with probability of each being bounded
by this expression.
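
As a numerical illustration (not part of the proof), one can compare the tail over $m > 10k$ with the target rate $2^{-k/2}$. The sketch below assumes the per-element bound $(3/4)^{(m-1)/2}$, so that the tail is a geometric series with ratio $\sqrt{3/4}$; the function names `tail_bound` and `target` are hypothetical.

```python
import math


def tail_bound(k):
    """Closed form of sum_{m >= 10k+1} (3/4)^((m-1)/2).

    The first term is (3/4)^(5k) and the common ratio is sqrt(3/4).
    """
    r = math.sqrt(0.75)
    return r ** (10 * k) / (1.0 - r)


def target(k):
    """The rate 2^(-k/2) that every class of events is compared against."""
    return 2.0 ** (-k / 2)


for k in (2, 5, 10, 20):
    # The ratio decays geometrically in k, so the tail is negligible.
    print(k, tail_bound(k), target(k), tail_bound(k) / target(k))
```

Under this assumption the ratio of the two quantities decreases geometrically in $k$, which is why the tail can be absorbed into the constant.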

From this point on we can assume that $A + A$ contains all the numbers greater
than $10k$ and consequently $\mathbb{N} \setminus (A + A) = [10k] \setminus (X + X)$. Let us estimate
the probability that $|X| \le 10\delta k$; since $X$ is uniformly distributed among all
subsets of $[10k]$, by Lemma A.1 we can estimate it by
$2^{-10k} \sum_{j \le 10\delta k} \binom{10k}{j} \le 2^{-(1 - H(\delta)) \cdot 10k}$. If $\delta$ is small enough, this
implies the claimed bound.

Now assume that $|X| > 10\delta k$ and estimate the probability that $X$ is
exceptional (according to the statement of Proposition 5.2) and obeys the
inequality $|[10k] \setminus (X + X)| \ge k$. This is bounded by




$$2^{-10k} \sum_{k' > 10\delta k} \; \sum_{m = 2k' - 1}^{19k} \cdots$$


The estimate $m \le 19k$ comes from the fact that $X + X$ misses at least $k$
points from $[10k] \subset [20k]$. Bounding each term crudely, i.e. using $H(x) \le 1$,
$m \le 19k$, we get a bound that is again as good as we need.



If we restrict the range to $m \le (19 - 30\delta)k$, the above expression divided
by 2^^^ is still bounded as we need. Therefore we can assume that in fact 
$|X + X| \ge (19 - 30\delta)k$.

Our bound gives us that the corresponding progression P has size at 
least (1 — 10(5)Y ^ (^ ~ 10^(5)fc. Therefore it has common difference 1. 
Suppose now that $X + X$ contains at least $2 \cdot 10^{4}\delta k$ elements less than $k/2$.
We know that $|(X + X) \setminus (P + P)| \le 12\delta m \le 240\delta k$, which means that the
least element of $P$ is at most $(1 - 19000\delta)k/2$. On the other hand, since
|p| ^ 1 - 1200 ^ ^ ^ _ 120005)fc, so there are at least 70005/c elements of 

$(10k, 20k]$ not in $P + P$, which belong to $A + A$ anyway. But out of them
only $240\delta k$ can belong to $X + X$, so that leaves over $6000\delta k$ elements of
$(A + A) \setminus (X + X)$ in $[20k]$. Adding to that $m \ge (19 - 30\delta)k$ elements of
$A + A$ we see that $A + A$ in fact misses fewer than $k$ elements, so this case
cannot hold.

Suppose now that $A$ (equivalently: $X$) contains fewer than $2 \cdot 10^{4}\delta k$ elements
less than $k/2$. So far we have not used the fact that $1 \in A$. We are going
to do this now. Let $B = A \cap (0, \tfrac{k}{2}]$, $C = A \cap (\tfrac{k}{2}, k]$ and $D = A \cap (k, +\infty)$.
Clearly $A + A \supset (1 + C) \cup (C + C) \cup (D + D)$ and those subsets are disjoint.
So certainly $|\mathbb{N} \setminus (A + A)| \ge k$ only if

$$|\mathbb{N} \setminus (D + D - 2k)| \ge k - (2k - |C| - |C + C|) = |C| + |C + C| - k.$$

By the Green-Morris estimate [GM15, Theorem 1.3] we can assume that the
probability of the latter is bounded by $C_\varepsilon (2 - \varepsilon)^{-(|C| + |C + C| - k)/2}$ for any $\varepsilon > 0$
and a suitable constant $C_\varepsilon > 0$. Note that the probability cannot exceed
1, so the trivial bound is actually better if $|C| + |C + C| < k$. Putting
everything together, the total probability that $|\mathbb{N} \setminus (A + A)| \ge k$ is bounded
by

$$2^{-k/2} \sum_{B} 2^{-k/2} \sum_{C} \min\Bigl(1,\; C_\varepsilon (2 - \varepsilon)^{-(|C| + |C + C| - k)/2}\Bigr).$$

Now by the Green-Morris bound on the number of sets with small sumset
[GM15, Proposition 3.1] we can divide the inner sum into classes depending
on the size of $|C|$ and $|C + C|$. This way we get a bound of

“ £)-('+”^-^)/ 2 ). 

B l,m^k ^ 

Now we have the upper bound on the size of $B$, which turns the outer sum
into additional coefficient (S)/ 2 _ only way the above expression 

could fail to be bounded as we need is if we could find among the expressions
• min(l, (2 — ig greater than (2 — r])^P for some 

small value of $\eta$. This firstly requires $m$ to be close to $k$ as the binomial
coefficient ^ 2 ™/^ Pas to be at least (2 — Also, we need to have 

(2 — £;j-(^+™-^)/2 ^ — |)^/^, which requires / -|- m to be not much bigger 

than $k$. Those two in turn imply that $l$ is small compared to $k$, in which case
does not exceed (2 — This indicates that our initial assumption 
was false and our bound in fact is always satisfied. □

Appendix A. Estimates of binomial coefficients 

In the appendix we are going to prove some useful inequalities concerning
binomial coefficients. Before we do that, let us define the binary entropy
function $H : [0,1] \to \mathbb{R}$ as $H(t) = t \log_2 \frac{1}{t} + (1 - t) \log_2 \frac{1}{1 - t}$. Note that it
does not quite make sense if $t = 0$ or $t = 1$, so we extend $H$ continuously
by setting $H(0) = H(1) = 0$. One can easily prove that $0 \le H(t) \le 1$ for
all $t \in [0,1]$ with the only extremal values being $H(0) = H(1) = 0$ and
$H(\tfrac12) = 1$. Also, $H$ is continuous, increasing on $[0, \tfrac12]$, decreasing on $[\tfrac12, 1]$,
and obeys the equality $H(t) = H(1 - t)$. By calculating the second derivative,
one can see that $H$ is concave, which together with previous observations
leads to the triangle inequality $H(|x + y|) \le H(|x|) + H(|y|)$ valid for all
$x, y$ for which all the expressions make sense.
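
The listed properties of $H$ are easy to check numerically. The following Python sketch is only an illustration: the names `binary_entropy` and `is_subadditive_on_grid` are ours, and the subadditivity check is restricted to nonnegative $x, y$ with $x + y \le 1$, which is an assumption about the range in which "all the expressions make sense".

```python
import math


def binary_entropy(t):
    """H(t) = t*log2(1/t) + (1-t)*log2(1/(1-t)), extended by H(0) = H(1) = 0."""
    if t < 0.0 or t > 1.0:
        raise ValueError("t must lie in [0, 1]")
    if t in (0.0, 1.0):
        return 0.0
    return t * math.log2(1.0 / t) + (1.0 - t) * math.log2(1.0 / (1.0 - t))


def is_subadditive_on_grid(steps=100, tol=1e-12):
    """Check H(x + y) <= H(x) + H(y) for grid points x, y >= 0 with x + y <= 1."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            x, y = i / steps, j / steps
            lhs = binary_entropy((i + j) / steps)
            if lhs > binary_entropy(x) + binary_entropy(y) + tol:
                return False
    return True


print(binary_entropy(0.5), binary_entropy(0.25), is_subadditive_on_grid())
```

The tolerance `tol` only absorbs floating-point rounding; the inequality itself follows from concavity together with $H(0) = 0$.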

Lemma A.1. Let $n, k \ge 0$ be integers and let $\delta \in [0, \tfrac12]$ be a real number.
Then the following inequalities are true:

$$\frac{1}{n + 1}\, 2^{H(k/n)\, n} \le \binom{n}{k} \le 2^{H(k/n)\, n}, \qquad \sum_{j \le \delta n} \binom{n}{j} \le 2^{H(\delta)\, n}.$$



Proof. Let us begin with the last inequality. Take a set of size $n$ and choose
a subset of it at random by picking each element independently with probability
$\delta$. Then for any $j \le \delta n$ the probability that a given $j$-element subset
is chosen is $\delta^{j} (1 - \delta)^{n - j}$. Since $\delta \le 1 - \delta$, we can bound it from below by

$$\delta^{\delta n} (1 - \delta)^{(1 - \delta) n} = 2^{-H(\delta) n}.$$

But the total probability cannot exceed 1, so the number of all those sets,
equal to $\sum_{j \le \delta n} \binom{n}{j}$, has to be at most $2^{H(\delta) n}$.

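Turning to the first two inequalities, assume without loss of generality
that $k \le \tfrac{n}{2}$; we can do that as we can always replace $k$ with $n - k$, given that
$\binom{n}{k} = \binom{n}{n - k}$ and $H(\tfrac{k}{n}) = H(1 - \tfrac{k}{n}) = H(\tfrac{n - k}{n})$. In this case let $\delta = \tfrac{k}{n} \le \tfrac12$ and
consider again the same random experiment. Writing the total probability
as a sum of probabilities of choosing a particular set, we get

$$1 = \sum_{j = 0}^{n} \binom{n}{j} \delta^{j} (1 - \delta)^{n - j}.$$

Note that the expression $\binom{n}{k} \delta^{k} (1 - \delta)^{n - k}$ is one of the summands in the above
sum, and by comparing two consecutive ones we can check that it is actually
the largest out of $n + 1$ summands. Therefore
$\frac{1}{n + 1} \le \binom{n}{k} \delta^{k} (1 - \delta)^{n - k} = \binom{n}{k}\, 2^{-H(k/n) n} \le 1$, as
needed. □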
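
The entropy bounds obtained in the proof above can be verified numerically for small parameters. The sketch below is an illustration under the assumption that the inequalities in question are $\frac{1}{n+1} 2^{H(k/n) n} \le \binom{n}{k} \le 2^{H(k/n) n}$ and $\sum_{j \le \delta n} \binom{n}{j} \le 2^{H(\delta) n}$ for $\delta \le \tfrac12$; the function names are hypothetical.

```python
import math


def binary_entropy(t):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if t in (0.0, 1.0):
        return 0.0
    return t * math.log2(1.0 / t) + (1.0 - t) * math.log2(1.0 / (1.0 - t))


def check_point_bounds(n, tol=1e-9):
    """Check 2^(H(k/n) n) / (n + 1) <= C(n, k) <= 2^(H(k/n) n) for 0 <= k <= n."""
    for k in range(n + 1):
        upper = 2.0 ** (binary_entropy(k / n) * n)
        c = math.comb(n, k)
        # Relative tolerance guards against rounding in the equality cases k = 0, n.
        if not (upper / (n + 1) <= c * (1 + tol) and c <= upper * (1 + tol)):
            return False
    return True


def check_tail_bound(n, delta, tol=1e-9):
    """Check sum_{j <= delta n} C(n, j) <= 2^(H(delta) n) for delta in [0, 1/2]."""
    total = sum(math.comb(n, j) for j in range(math.floor(delta * n) + 1))
    return total <= 2.0 ** (binary_entropy(delta) * n) * (1 + tol)


print(all(check_point_bounds(n) for n in range(1, 61)))
print(all(check_tail_bound(n, d) for n in range(1, 61) for d in (0.1, 0.25, 0.5)))
```

Equality in the upper bound occurs only for $k = 0$ and $k = n$, which is why a small relative tolerance is used in the comparisons.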